Java Threads & Concurrency: Understanding OS-Level Implementation

Table of Contents

  1. Thread Fundamentals
  2. OS-Level Thread Management
  3. CPU and Thread Execution
  4. Memory and RAM Interaction
  5. Practical Examples

Thread Fundamentals

What is a Thread?

A thread is the smallest unit of execution that can be scheduled by the operating system. Think of it as a lightweight process that shares memory with other threads in the same process.

Key Concept: When you create a Java thread, you're actually requesting the OS to create a native thread.

// Simple thread creation
Thread thread = new Thread(() -> {
    System.out.println("Running in: " + Thread.currentThread().getName());
});
thread.start(); // This triggers OS-level thread creation

OS-Level Thread Management

The Journey from Java to OS

When you call thread.start() in Java, here's what happens underneath:

Java Application (JVM)
        ↓
JVM Thread API
        ↓
Native Thread Library (pthreads on Linux, Windows Threads on Windows)
        ↓
Operating System Kernel
        ↓
Scheduler assigns the thread to a CPU core

Thread Models

1:1 Model (Java's platform threads use this)

  • One Java thread = One OS thread
  • Each Java thread maps directly to a kernel thread
public class ThreadMappingExample {
    public static void main(String[] args) {
        // Creating 3 Java threads = 3 OS threads
        for (int i = 0; i < 3; i++) {
            Thread t = new Thread(() -> {
                // threadId() requires Java 19+; use getId() on older JDKs
                System.out.println("Java thread ID: " + Thread.currentThread().threadId());
                // Note: this is the OS *process* ID, shared by all threads in
                // the process; native thread IDs are visible in tools like jstack
                System.out.println("Process ID: " + ProcessHandle.current().pid());
            });
            t.start();
        }
    }
}

CPU and Thread Execution

How CPU Executes Threads

Single Core CPU:

Time Slice 1: Thread A executes
Time Slice 2: Thread B executes (context switch)
Time Slice 3: Thread A executes (context switch)
Time Slice 4: Thread C executes (context switch)

Multi-Core CPU:

Core 1: Thread A ┐
Core 2: Thread B │ All execute simultaneously
Core 3: Thread C │
Core 4: Thread D ┘

Context Switching

When the OS switches from one thread to another, it must:

  1. Save current thread state (registers, program counter, stack pointer) → RAM
  2. Load next thread state from RAM → CPU registers
  3. Resume execution
public class ContextSwitchExample {
    public static void main(String[] args) {
        // With 1000 threads on 8 cores, expect lots of context switching
        for (int i = 0; i < 1000; i++) {
            new Thread(() -> {
                // CPU time slicing happens here
                for (int j = 0; j < 1000000; j++) {
                    Math.sqrt(j); // CPU-intensive work
                }
            }).start();
        }
    }
}

Cost of Context Switching:

  • Save/restore CPU registers: ~1-2 microseconds
  • Cache invalidation (CPU cache needs to reload data)
  • TLB (Translation Lookaside Buffer) flush
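One rough way to feel this cost is to run the same total amount of CPU-bound work split across a few threads versus across many. This is a sketch, not a rigorous benchmark (the class name and iteration counts are illustrative, and results vary widely by machine and JVM), but oversubscribing the cores forces the scheduler to context-switch, which typically adds overhead:

```java
public class ContextSwitchCost {
    // Run totalIterations of sqrt work split evenly across `threads` threads;
    // return elapsed wall time in milliseconds.
    static long runWith(int threads, long totalIterations) throws InterruptedException {
        long perThread = totalIterations / threads;
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                double sink = 0;
                for (long j = 0; j < perThread; j++) sink += Math.sqrt(j);
                if (sink < 0) System.out.println(sink); // defeat dead-code elimination
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        long total = 50_000_000L;
        System.out.println(cores + " threads (one per core): " + runWith(cores, total) + " ms");
        System.out.println("1000 threads (oversubscribed): " + runWith(1000, total) + " ms");
    }
}
```

Thread creation itself also costs time here, so for short workloads the difference may be dominated by startup rather than context switching.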

Memory and RAM Interaction

Thread Memory Layout

Each thread has:

┌─────────────────────────────────┐
│      PROCESS MEMORY SPACE       │
├─────────────────────────────────┤
│ Heap (Shared by all threads)    │ ← Objects created with 'new'
├─────────────────────────────────┤
│ Method Area (Shared)            │ ← Class metadata, static variables
├─────────────────────────────────┤
│ Thread 1 Stack (Private)        │ ← Local variables, method calls
├─────────────────────────────────┤
│ Thread 2 Stack (Private)        │
├─────────────────────────────────┤
│ Thread 3 Stack (Private)        │
└─────────────────────────────────┘
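The split in the layout above can be demonstrated directly. In this sketch (class and field names are illustrative), the `heapValue` field lives in the shared heap and is visible to both threads, while each thread's `stackLocal` variable lives on that thread's private stack:

```java
public class HeapVsStackExample {
    static class Shared { int heapValue = 0; } // object lives on the heap

    static int run() throws InterruptedException {
        Shared shared = new Shared();

        Runnable task = () -> {
            int stackLocal = 0;                 // on this thread's private stack
            for (int i = 0; i < 1000; i++) stackLocal++;
            synchronized (shared) {             // heap writes need synchronization
                shared.heapValue += stackLocal;
            }
        };

        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();

        // Each thread's stackLocal was independent; the shared heap field
        // accumulated both contributions.
        return shared.heapValue;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("heapValue = " + run()); // prints 2000
    }
}
```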

Memory Visibility Problem

public class MemoryVisibilityExample {
    // Without volatile, changes might not be visible across threads
    private static boolean flag = false;

    public static void main(String[] args) throws InterruptedException {
        // Thread 1: Reads flag
        Thread reader = new Thread(() -> {
            while (!flag) {
                // CPU might cache 'flag' value in register
                // Never reads updated value from RAM!
            }
            System.out.println("Flag is now true!");
        });

        // Thread 2: Writes flag
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(1000);
                flag = true; // Written to CPU cache, maybe not RAM yet
                System.out.println("Flag set to true");
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        });

        reader.start();
        writer.start();
    }
}
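The standard fix for the visibility problem above is to declare the flag `volatile`, which forces reads and writes through main memory with the appropriate memory barriers, so the reader is guaranteed to observe the writer's update (a minimal sketch; the class name is illustrative):

```java
public class VolatileFlagExample {
    // volatile guarantees visibility: every read sees the latest write
    private static volatile boolean flag = false;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!flag) {
                // busy-wait; each iteration re-reads flag, never a stale cached copy
            }
            System.out.println("Flag is now true!");
        });

        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(1000);
                flag = true; // volatile write: immediately visible to the reader
                System.out.println("Flag set to true");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        writer.start();
        reader.join(); // terminates reliably because the update is visible
        writer.join();
    }
}
```

Note that `volatile` fixes visibility only; it does not make compound operations like `counter++` atomic.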

CPU Cache and Memory Hierarchy

CPU Core
├─ L1 Cache (32-64 KB, ~1 ns access)
├─ L2 Cache (256 KB, ~3 ns access)
└─ L3 Cache (Shared, 8-32 MB, ~12 ns access)

Main RAM (GB, ~100 ns access)

Why This Matters:

public class CacheCoherenceExample {
    private static int sharedCounter = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 100000; i++) {
                // sharedCounter++ is really three steps: read, add 1, write back.
                // Core 1 reads sharedCounter into its cache, increments it,
                // and writes back (eventually)
                sharedCounter++;
            }
        });

        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 100000; i++) {
                // Core 2 does the same; the two read-modify-write sequences
                // can interleave, so increments get lost
                sharedCounter++;
            }
        });

        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // Expected: 200000, Actual: usually less (lost updates)
        System.out.println("Counter: " + sharedCounter);
    }
}
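One standard fix for the lost updates above (besides `synchronized`, shown later) is `AtomicInteger`, which performs the whole read-modify-write as a single atomic hardware operation (compare-and-swap), so no increment is lost. A minimal sketch; the class name is illustrative:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounterExample {
    private static final AtomicInteger sharedCounter = new AtomicInteger(0);

    static int run() throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                // Atomic increment: the CPU's cache-coherence protocol plus a
                // compare-and-swap instruction make this safe without a lock
                sharedCounter.incrementAndGet();
            }
        };

        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        return sharedCounter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Counter: " + run()); // always 200000
    }
}
```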

Practical Examples

Example 1: CPU-Bound Task

public class CPUBoundExample {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU Cores: " + cores);

        // Creating as many threads as cores is optimal for CPU-bound tasks
        for (int i = 0; i < cores; i++) {
            Thread t = new Thread(() -> {
                // Each thread can likely run on its own core
                long sum = 0;
                for (long j = 0; j < 1_000_000_000L; j++) {
                    sum += j;
                }
                System.out.println("Sum: " + sum);
            });
            t.start();
        }
    }
}

What Happens:

  1. JVM creates 8 threads (on 8-core CPU)
  2. OS scheduler assigns 1 thread per core
  3. Each core executes its thread with minimal context switching
  4. CPU utilization: ~100%
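The same pattern is usually written with an `ExecutorService`: a fixed pool sized to the core count reuses threads instead of creating one per task, and `Future.get()` collects results. A sketch of that variant (class name and iteration count are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CpuBoundPoolExample {
    static long run() throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        // One worker thread per core: minimal context switching for CPU-bound work
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        List<Future<Long>> results = new ArrayList<>();

        for (int i = 0; i < cores; i++) {
            results.add(pool.submit(() -> {
                long sum = 0;
                for (long j = 0; j < 10_000_000L; j++) sum += j;
                return sum; // each task returns its partial sum
            }));
        }

        long total = 0;
        for (Future<Long> f : results) total += f.get(); // blocks until each task finishes
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Total: " + run());
    }
}
```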

Example 2: I/O-Bound Task

public class IOBoundExample {
    public static void main(String[] args) {
        // I/O-bound: Can create many more threads than cores
        for (int i = 0; i < 1000; i++) {
            Thread t = new Thread(() -> {
                try {
                    // Thread blocks, OS removes from CPU
                    Thread.sleep(1000); // Simulates I/O wait
                    // Thread wakes, OS schedules back to CPU
                    System.out.println("Done waiting");
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            });
            t.start();
        }
    }
}

What Happens:

  1. Thread calls sleep() → moves to TIMED_WAITING state
  2. OS removes thread from CPU scheduler
  3. CPU is free for other threads
  4. After sleep, thread moves to RUNNABLE → OS schedules it back
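A quick way to see that blocked threads cost no CPU time: 100 threads that each sleep for one second all finish in roughly one second of total wall time, because the OS parks them off the CPU while they wait. A sketch (class name is illustrative):

```java
public class ConcurrentSleepExample {
    // Start `threads` threads that each sleep 1 second; return total wall time in ms.
    static long run(int threads) throws InterruptedException {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    Thread.sleep(1000); // TIMED_WAITING: off the CPU, costs no cycles
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        // All 100 sleeps overlap, so this prints roughly 1000, not 100_000
        System.out.println("100 sleeping threads took ~" + run(100) + " ms");
    }
}
```

If the sleeps ran sequentially this would take 100 seconds; overlapping waits are exactly why I/O-bound workloads tolerate far more threads than cores.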

Example 3: Proper Synchronization

public class SynchronizedExample {
    private static int counter = 0;
    private static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 100000; i++) {
                synchronized (lock) {
                    // CPU acquires lock (atomic operation at hardware level)
                    // Memory barrier: flushes CPU cache to RAM
                    counter++;
                    // Memory barrier: ensures write is visible
                    // CPU releases lock
                }
            }
        });

        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 100000; i++) {
                synchronized (lock) {
                    counter++;
                }
            }
        });

        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println("Counter: " + counter); // Always 200000
    }
}

OS-Level Operations:

  1. Thread requests lock → OS/JVM checks lock status
  2. If locked: Thread goes to BLOCKED state (not using CPU)
  3. Lock owner releases → OS wakes waiting thread
  4. synchronized creates memory barriers (CPU instruction)
  5. Cache coherence protocol ensures all cores see updates
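The same steps can be expressed with `java.util.concurrent.locks.ReentrantLock`, an explicit lock that gives the same mutual exclusion and memory-barrier guarantees as `synchronized`, plus extras like `tryLock()` and fairness policies. A sketch (class name is illustrative):

```java
import java.util.concurrent.locks.ReentrantLock;

public class ReentrantLockExample {
    private static int counter = 0;
    private static final ReentrantLock lock = new ReentrantLock();

    static int run() throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                lock.lock();       // blocks (off the CPU) if another thread holds it
                try {
                    counter++;     // protected critical section
                } finally {
                    lock.unlock(); // release + memory barrier, like exiting synchronized
                }
            }
        };

        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Counter: " + run()); // always 200000
    }
}
```

The `try`/`finally` is essential: unlike `synchronized`, an explicit lock is not released automatically if the critical section throws.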

Example 4: Understanding Thread States

public class ThreadStatesExample {
    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();

        Thread t = new Thread(() -> {
            synchronized (lock) {
                try {
                    System.out.println("RUNNABLE -> CPU executing");
                    Thread.sleep(1000);
                    System.out.println("TIMED_WAITING -> Off CPU, in RAM");
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        });

        System.out.println("NEW: " + t.getState()); // Thread object in heap
        t.start();
        System.out.println("RUNNABLE: " + t.getState()); // OS scheduled
        Thread.sleep(500);
        System.out.println("TIMED_WAITING: " + t.getState()); // Off CPU
        t.join();
        System.out.println("TERMINATED: " + t.getState()); // OS cleaned up
    }
}

Thread Lifecycle at OS Level

NEW (Java object in heap)
   │ start()
   ▼
RUNNABLE (OS ready queue)
   │ OS scheduler
   ▼
RUNNING (executing on a CPU core)
   │ sleep() / wait() / blocking I/O
   ▼
WAITING / TIMED_WAITING (off CPU, state saved in RAM)
   │ notify() / interrupt() / I/O complete
   ▼
RUNNABLE (back in the OS ready queue)
   │ execution completes
   ▼
TERMINATED (OS cleans up resources)
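One state the lifecycle above doesn't show is BLOCKED, which a thread enters when it tries to acquire a monitor another thread currently holds. A sketch demonstrating it (class name is illustrative; the poll loop avoids relying on exact timing):

```java
public class BlockedStateExample {
    static Thread.State run() throws InterruptedException {
        Object lock = new Object();

        // Grabs the lock and holds it for 1 second
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                try { Thread.sleep(1000); } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        // Tries to grab the same lock and must wait for holder to release it
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // empty: we only care about the state while waiting to enter
            }
        });

        holder.start();
        Thread.sleep(100); // let holder acquire the lock first
        waiter.start();

        // Poll until the waiter hits the contended monitor (or we time out)
        Thread.State observed = waiter.getState();
        long deadline = System.currentTimeMillis() + 2000;
        while (observed != Thread.State.BLOCKED && System.currentTimeMillis() < deadline) {
            observed = waiter.getState();
        }

        holder.join();
        waiter.join();
        return observed; // BLOCKED: parked by the JVM/OS, not consuming CPU
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("waiter state while contended: " + run());
    }
}
```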

Key Takeaways

  1. Java Thread = OS Thread: Each platform thread maps 1:1 to a native OS thread (virtual threads, added in Java 21, are the exception)
  2. Context Switching: An expensive operation that saves and restores thread state via RAM
  3. CPU Cores: Limit true parallelism (8 cores = max 8 threads running simultaneously)
  4. Memory Visibility: Changes in one core's cache might not be visible to others without synchronization
  5. Thread Stack: Each thread gets private stack space in RAM (~1 MB default)
  6. Shared Heap: All threads share heap memory, need synchronization
  7. OS Scheduler: Decides which thread runs on which core and when

Understanding these concepts helps you write efficient concurrent programs and debug threading issues!